AITopics | subtask 1

Collaborating Authors

subtask 1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models

Liu, Shuodi, Liu, Yingzhuo, Wang, Zi, Wang, Yusheng, Wu, Huijia, Xiang, Liuyu, He, Zhaofeng

arXiv.org Artificial IntelligenceOct-22-2025

Large language models (LLMs) have demonstrated remarkable reasoning and planning capabilities, driving extensive research into task decomposition. Existing task decomposition methods focus primarily on memory, tool usage, and feedback mechanisms, achieving notable success in specific domains, but they often overlook the trade-off between performance and cost. In this study, we first conduct a comprehensive investigation on task decomposition, identifying six categorization schemes. Then, we perform an empirical analysis of three factors that influence the performance and cost of task decomposition: categories of approaches, characteristics of tasks, and configuration of decomposition and execution models, uncovering three critical insights and summarizing a set of practical principles. Building on this analysis, we propose the Select-Then-Decompose strategy, which establishes a closed-loop problem-solving process composed of three stages: selection, execution, and verification. This strategy dynamically selects the most suitable decomposition approach based on task characteristics and enhances the reliability of the results through a verification module. Comprehensive evaluations across multiple benchmarks show that the Select-Then-Decompose consistently lies on the Pareto frontier, demonstrating an optimal balance between performance and cost. Our code is publicly available at https://github.com/summervvind/Select-Then-Decompose.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.17922

Country:

Europe > United Kingdom (0.04)
North America > United States (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection

Macko, Dominik

arXiv.org Artificial IntelligenceSep-25-2025

The large language models (LLMs) are able to generate high-quality texts in multiple languages. Such texts are often not recognizable by humans as generated, and therefore present a potential of LLMs for misuse (e.g., plagiarism, spams, disinformation spreading). An automated detection is able to assist humans to indicate the machine-generated texts; however, its robustness to out-of-distribution data is still challenging. This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification. It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance (1st rank) in both, the binary detection as well as the multiclass classification of various cases of human-AI collaboration.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.01702

Country:

Europe (1.00)
Asia (0.94)
North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training

Thelen, Christian Rene, Blaneck, Patrick Gustav, Bornheim, Tobias, Grieger, Niklas, Bialonski, Stephan

arXiv.org Artificial IntelligenceSep-17-2025

Positive, supportive online communication in social media (candy speech) has the potential to foster civility, yet automated detection of such language remains underexplored, limiting systematic analysis of its impact. We investigate how candy speech can be reliably detected in a 46k-comment German YouTube corpus by monolingual and multilingual language models, including GBERT, Qwen3 Embedding, and XLM-RoBERTa. We find that a multilingual XLM-RoBERTa-Large model trained to detect candy speech at the span level outperforms other approaches, ranking first in both binary positive F1: 0.8906) and categorized span-based detection (strict F1: 0.6307) subtasks at the GermEval 2025 Shared Task on Candy Speech Detection. We speculate that span-based training, multilingual capabilities, and emoji-aware tokenizers improved detection performance. Our results demonstrate the effectiveness of multilingual models in identifying positive, supportive language.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.07459

Country:

Europe > Germany (0.70)
North America > United States (0.68)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications > Social Media (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering

Alhuzali, Hassan, Al-Eisawi, Walid, Abdul-Mageed, Muhammad, Abouzahir, Chaimae, Abu-Daoud, Mouath, Alasmari, Ashwag, Al-Monef, Renad, Alqahtani, Ali, Ayash, Lama, Kharouf, Leen, Shamout, Farah E., Habash, Nizar

arXiv.org Artificial IntelligenceSep-16-2025

We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 (co-located with EMNLP 2025). This shared task addresses the paucity of high-quality Arabic medical QA resources by offering two complementary tracks: MentalQA, focusing on Arabic mental health Q&A (e.g., anxiety, depression, stigma reduction), and MedArabiQ, covering broader medical domains such as internal medicine, pediatrics, and clinical decision making. Each track comprises multiple subtasks, evaluation datasets, and standardized metrics, facilitating fair benchmarking. The task was structured to promote modeling under realistic, multilingual, and culturally nuanced healthcare contexts. We outline the dataset creation, task design and evaluation framework, participation statistics, baseline systems, and summarize the overall outcomes. We conclude with reflections on the performance trends observed and prospects for future iterations in Arabic health QA.

large language model, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2508.20047

Country:

Asia > Middle East (0.46)
Europe (0.46)
Africa > Middle East > Egypt (0.15)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

From Detection to Mitigation: Addressing Gender Bias in Chinese Texts via Efficient Tuning and Voting-Based Rebalancing

Wu, Chengyan, Cai, Yiqiang, Cheng, Yufei, Xue, Yun

arXiv.org Artificial IntelligenceSep-10-2025

This paper presents our team's solution to Shared Task 7 of NLPCC-2025, which focuses on sentence-level gender bias detection and mitigation in Chinese. The task aims to promote fairness and con-trollability in natural language generation by automatically detecting, classifying, and mitigating gender bias. To address this challenge, we adopt a fine-tuning approach based on large language models (LLMs), efficiently adapt to the bias detection task via Low-Rank Adaptation (LoRA). In terms of data processing, we construct a more balanced training set to alleviate class imbalance and introduce heterogeneous samples from multiple sources to enhance model generalization. For the detection and classification sub-tasks, we employ a majority voting strategy that integrates outputs from multiple expert models to boost performance. Additionally, to improve bias generation detection and mitigation, we design a multi-temperature sampling mechanism to capture potential variations in bias expression styles. Experimental results demonstrate the effectiveness of our approach in bias detection, classification, and mitigation. Our method ultimately achieves an average score of 47.90%, ranking fourth in the shared task.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.07889

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)

Add feedback

PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture

Alwajih, Fakhraddin, Mekki, Abdellah El, Mubarak, Hamdy, Hawasly, Majd, Mohamed, Abubakr, Abdul-Mageed, Muhammad

arXiv.org Artificial IntelligenceSep-3-2025

Large Language Models (LLMs) inherently reflect the vast data distributions they encounter during their pre-training phase. As this data is predominantly sourced from the web, there is a high chance it will be skewed towards high-resourced languages and cultures, such as those of the West. Consequently, LLMs often exhibit a diminished understanding of certain communities, a gap that is particularly evident in their knowledge of Arabic and Islamic cultures. This issue becomes even more pronounced with increasingly under-represented topics. To address this critical challenge, we introduce PalmX 2025, the first shared task designed to benchmark the cultural competence of LLMs in these specific domains. The task is composed of two subtasks featuring multiple-choice questions (MCQs) in Modern Standard Arabic (MSA): General Arabic Culture and General Islamic Culture. These subtasks cover a wide range of topics, including traditions, food, history, religious practices, and language expressions from across 22 Arab countries. The initiative drew considerable interest, with 26 teams registering for Subtask 1 and 19 for Subtask 2, culminating in nine and six valid submissions, respectively. Our findings reveal that task-specific fine-tuning substantially boosts performance over baseline models. The top-performing systems achieved an accuracy of 72.15% on cultural questions and 84.22% on Islamic knowledge. Parameter-efficient fine-tuning emerged as the predominant and most effective approach among participants, while the utility of data augmentation was found to be domain-dependent.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.0255

Country:

North America (1.00)
Africa > Middle East (1.00)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.84)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

A Multi-Strategy Approach for AI-Generated Text Detection

Zain, Ali, Farooqui, Sareem, Rafi, Muhammad

arXiv.org Artificial IntelligenceSep-3-2025

This paper presents presents three distinct systems developed for the M-DAIGT shared task on detecting AI generated content in news articles and academic abstracts. The systems includes: (1) A fine-tuned RoBERTa-base classifier, (2) A classical TF-IDF + Support Vector Machine (SVM) classifier , and (3) An Innovative ensemble model named Candace, leveraging probabilistic features extracted from multiple Llama-3.2 models processed by a customTransformer encoder.The RoBERTa-based system emerged as the most performant, achieving near-perfect results on both development and test sets.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.00623

Country:

Asia (0.29)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)

Add feedback

Medico 2025: Visual Question Answering for Gastrointestinal Imaging

Gautam, Sushant, Thambawita, Vajira, Riegler, Michael, Halvorsen, Pål, Hicks, Steven

arXiv.org Artificial IntelligenceAug-15-2025

The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image analysis. Instructions, data access, and an updated guide for participation are available in the official competition repository: https://github.com/simula/MediaEval-Medico-2025

medico 2025, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2508.10869

Country:

Europe > Norway (0.17)
Europe > Czechia (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.89)

Add feedback

LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA

Gude, Adrián, Santos-Ríos, Roi, Prado-Valiño, Francisco, Ezquerro, Ana, Vilares, Jesús

arXiv.org Artificial IntelligenceAug-13-2025

This paper describes our participation in SemEval 2025 Task 8, focused on Tabular Question Answering. We developed a zero-shot pipeline that leverages an Large Language Model to generate functional code capable of extracting the relevant information from tabular data based on an input question. Our approach consists of a modular pipeline where the main code generator module is supported by additional components that identify the most relevant columns and analyze their data types to improve extraction accuracy. In the event that the generated code fails, an iterative refinement process is triggered, incorporating the error feedback into a new generation prompt to enhance robustness. Our results show that zero-shot code generation is a valid approach for Tabular QA, achieving rank 33 of 53 in the test phase despite the lack of task-specific fine-tuning.

computational linguistic, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.09012

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(6 more...)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

FHSTP@EXIST 2025 Benchmark: Sexism Detection with Transparent Speech Concept Bottleneck Models

Labadie-Tamayo, Roberto, Böck, Adrian Jaques, Slijepčević, Djordje, Chen, Xihui, Babic, Andreas, Zeppelzauer, Matthias

arXiv.org Artificial IntelligenceJul-29-2025

Sexism has become widespread on social media and in online conversation. To help address this issue, the fifth Sexism Identification in Social Networks (EXIST) challenge is initiated at CLEF 2025. Among this year's international benchmarks, we concentrate on solving the first task aiming to identify and classify sexism in social media textual posts. In this paper, we describe our solutions and report results for three subtasks: Subtask 1.1 - Sexism Identification in Tweets, Subtask 1.2 - Source Intention in Tweets, and Subtask 1.3 - Sexism Categorization in Tweets. We implement three models to address each subtask which constitute three individual runs: Speech Concept Bottleneck Model (SCBM), Speech Concept Bottleneck Model with Transformer (SCBMT), and a fine-tuned XLM-RoBERTa transformer model. SCBM uses descriptive adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to encode input texts into a human-interpretable representation of adjectives, then used to train a lightweight classifier for downstream tasks. SCBMT extends SCBM by fusing adjective-based representation with contextual embeddings from transformers to balance interpretability and classification performance. Beyond competitive results, these two models offer fine-grained explanations at both instance (local) and class (global) levels. We also investigate how additional metadata, e.g., annotators' demographic profiles, can be leveraged. For Subtask 1.1, XLM-RoBERTa, fine-tuned on provided data augmented with prior datasets, ranks 6th for English and Spanish and 4th for English in the Soft-Soft evaluation. Our SCBMT achieves 7th for English and Spanish and 6th for Spanish.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.20924

Country: Europe > Spain (0.28)

Genre: Research Report (0.50)

Industry:

Information Technology (0.49)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback